22.02.22 - Machine Learning_ Techniques
Types of Tasks
Supervised Learning - Function from labelled training data, each example consisting of an input and a desired output (Classifications, regression)
Unsupervised Learning - Function to describe hidden structure from unlabelled data. There is basically no output
Classification - Learn to predict to which set a instance belongs to based on pre-labeled (classified) instances Supervised learning, many approaches: regression, decision trees, neural networks
Supervised Learning
Regression
Estimated relationship between variable Y and variable(s) X Function: Based on the given data to minimise its mean squared error (MSE) to 'fit' the data
Decision Tree
Internal Nodes: decision rules on features (decision variables, input) Leaf Nodes: a predicted class label(output) |Pros|Cons| |----|----| |Reasonable training time| Only simple decision boundaries| |Handle large features|Problem with lots of missing data| |Easy to implement|Cannot handle complicated relationship| |Easy to interpret|
Neural Networks
Set of neurons connected by directed weighted edges |Pros|Cons| |----|----| |Cam learn more complicated class boundaries|Hard to implement| |Can be more accurate| Slow training time| |Can handle large number of features|Can over fit the data| | |Hard to interpret|
Supervised Learning: Applications
Clustering
- Given: Un-labeled data set D and Similarity/distance metric
- Goal: Find 'natural' partitioning, or groups of similar data points
- Application: Divide a market into distinct subsets of customers
Association Rules
Discover correlation between any two or more variables
- Given: Set of records containing items
- Goal: Produce dependency rules, to predict occurrence of one variable based on occurrences of another variable
- Application: Market basket analysis
Correlation Causation